Exploratory Data Analysis for U.S. Counties

Author: Sultan Abdullah

Last Updated: 3/30/2020

Description: Initial exploration of county-level COVID-19 data to discover patterns, spot anomalies, test hypotheses, and check assumptions using summary statistics and graphical representations.

Importing Libraries

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from matplotlib import style
style.use('ggplot')

Reading Data

In [2]:
os.chdir(r"C:\Users\Sultan\Documents\GitHub\covid-19-data-analysis\util\data")
In [3]:
df = pd.read_csv('us-counties.csv')
In [4]:
df.head()
Out[4]:
         date     county       state     fips  cases  deaths
0  2020-01-21  Snohomish  Washington  53061.0      1       0
1  2020-01-22  Snohomish  Washington  53061.0      1       0
2  2020-01-23  Snohomish  Washington  53061.0      1       0
3  2020-01-24       Cook    Illinois  17031.0      1       0
4  2020-01-24  Snohomish  Washington  53061.0      1       0
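The fips column prints as a float (53061.0), which often happens when a numeric column contains missing values. A minimal sketch, assuming FIPS codes are wanted as zero-padded five-character strings; `format_fips` is a hypothetical helper introduced here, not part of the original notebook:

```python
import pandas as pd

def format_fips(fips: pd.Series) -> pd.Series:
    """Turn float FIPS codes into zero-padded 5-character strings.

    Missing values stay missing (<NA>) instead of becoming the text 'nan'.
    """
    # Int64 is pandas' nullable integer type, so NaN survives the cast
    return fips.astype("Int64").astype("string").str.zfill(5)

# Illustrative values: two codes from the output above style, plus a missing one
sample = pd.Series([53061.0, 1001.0, None])
formatted = format_fips(sample)
print(formatted.tolist())
```

In the notebook this could be applied as df['fips'] = format_fips(df['fips']), so that codes for states with a leading zero keep all five digits.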
In [5]:
# Summarize the numeric columns of the DataFrame
df.describe()
Out[5]:
               fips         cases        deaths
count  17452.000000  17731.000000  17731.000000
mean   28856.508538     33.925611      0.529525
std    15795.409823    446.659162      7.867592
min     1001.000000      1.000000      0.000000
25%    16001.000000      1.000000      0.000000
50%    28069.000000      3.000000      0.000000
75%    42064.000000      9.000000      0.000000
max    56043.000000  30766.000000    672.000000
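The count row shows 17,452 fips values against 17,731 values for cases and deaths, so 279 rows have no FIPS code. A minimal sketch of how missing values could be tallied per column; `missing_report` is a hypothetical helper, and the sample rows are made up for illustration only:

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the number and share of missing values for each column."""
    missing = frame.isna().sum()
    return pd.DataFrame({"missing": missing, "share": missing / len(frame)})

# Made-up rows with the same columns as us-counties.csv (not real data)
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02"],
    "county": ["Snohomish", "Unknown", "Cook"],
    "state": ["Washington", "Washington", "Illinois"],
    "fips": [53061.0, None, 17031.0],
    "cases": [1, 2, 3],
    "deaths": [0, 0, 0],
})
report = missing_report(sample)
print(report)
```

On the notebook's data this would be missing_report(df), and df[df['fips'].isna()] would list the affected rows for inspection.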
In [6]:
df.shape
Out[6]:
(17731, 6)

Visualize the Data

In [7]:
# Run once if plotly is not installed
# !pip install plotly

# If you run into issues (for example with fig.write_image), run this once
# !conda install -c plotly plotly-orca

import plotly.express as px

# The labels dict is keyed by column name, so 'cases' (not 'y') sets the axis label
fig = px.bar(df, x='date', y='cases', color='county', labels={'cases': 'Cases'},
             hover_data=['county'],
             title='Evolution of Reported COVID-19 Cases in United States Counties')
fig.write_image('../img/evolution-covid-19-cases-counties.png')
fig.show()
In [8]:
# The labels dict is keyed by column name, so 'deaths' (not 'y') sets the axis label
fig = px.bar(df, x='date', y='deaths', color='county', labels={'deaths': 'Deaths'},
             hover_data=['county'],
             title='Evolution of Reported COVID-19 Deaths in United States Counties')
fig.write_image('../img/evolution-covid-19-deaths-counties.png')
fig.show()
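County names are not unique across the United States (several states have a Washington County, for example), so coloring by county alone may merge unrelated counties into one series. A minimal sketch, assuming a combined label is acceptable for the legend; `add_county_state` and the `county_state` column are names introduced here for illustration:

```python
import pandas as pd

def add_county_state(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'county_state' column such as 'Cook, Illinois'."""
    out = frame.copy()  # leave the caller's frame untouched
    out["county_state"] = out["county"] + ", " + out["state"]
    return out

# Made-up rows: two different counties that share a name
sample = pd.DataFrame({
    "county": ["Washington", "Washington"],
    "state": ["Oregon", "Utah"],
    "cases": [1, 2],
})
labeled = add_county_state(sample)
print(labeled["county_state"].tolist())
```

The bar charts could then pass color='county_state' instead of color='county' so each county gets its own series.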
In [9]:
# Tree Map Visualization of COVID-19 Cases by County and Date
fig = px.treemap(df.sort_values(by='cases', ascending=False).reset_index(drop=True), 
                 path=["county", "date"], values="cases", height=700,
                 title='Number of COVID-19 cases by County and Date',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
fig.write_image('../img/treemap-of-covid-19-cases-counties.png')
fig.show()
In [10]:
# Tree Map Visualization of COVID-19 Deaths by County and Date
fig = px.treemap(df.sort_values(by='deaths', ascending=False).reset_index(drop=True), 
                 path=["county", "date"], values="deaths", height=700,
                 title='Number of deaths from COVID-19 by County and Date',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
fig.write_image('../img/treemap-of-covid-19-deaths-counties.png')
fig.show()
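The first rows of the data show Snohomish at 1 case on several consecutive days, which suggests the counts are cumulative. If so, a treemap that sums a value over every date would count the same cases or deaths repeatedly. A minimal sketch, under that assumption, that keeps only the most recent row per county; `latest_per_county` is a hypothetical helper and the sample rows are made up:

```python
import pandas as pd

def latest_per_county(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep the most recent row for each (state, county) pair.

    Assumes 'date' holds ISO-formatted strings (YYYY-MM-DD), which sort
    chronologically, or actual datetime values.
    """
    ordered = frame.sort_values("date")
    latest = ordered.drop_duplicates(subset=["state", "county"], keep="last")
    return latest.reset_index(drop=True)

# Made-up cumulative counts for illustration (not real data)
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-02"],
    "county": ["Snohomish", "Snohomish", "Cook"],
    "state": ["Washington", "Washington", "Illinois"],
    "cases": [1, 3, 2],
    "deaths": [0, 1, 0],
})
latest = latest_per_county(sample)
print(latest)
```

The result could be passed to px.treemap with path=["state", "county"], so each box reflects a county's latest total rather than a sum over days.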